Skip to content

Add ocean basin labeling and masking framework from ClimaOcean#140

Open
simone-silvestri wants to merge 15 commits intomainfrom
ss-js/update-from-climaocean
Open

Add ocean basin labeling and masking framework from ClimaOcean#140
simone-silvestri wants to merge 15 commits intomainfrom
ss-js/update-from-climaocean

Conversation

@simone-silvestri
Copy link
Copy Markdown
Member

@simone-silvestri simone-silvestri commented Mar 24, 2026

Summary

ClimaOcean has developed a basin labeling and masking framework that enables identifying and isolating individual ocean basins (Atlantic, Pacific, etc.) from bathymetry data. This is needed for basin-specific diagnostics, regional restoring, and inpainting. Meanwhile, NumericalEarth had independently added a bathymetry caching system to avoid recomputing regridded bathymetry on every call.

This PR reconciles the two codebases by porting ClimaOcean's labeling advances into NumericalEarth while preserving the caching infrastructure, preparing for CliMA/ClimaOcean.jl#757 which will have ClimaOcean depend on NumericalEarth's Bathymetry module for this functionality.

Changes

  • Add Barrier type for geographically separating connected ocean basins (e.g. closing Drake Passage to isolate the Atlantic)
  • Add label_ocean_basins with support for periodic and tripolar grid boundary conditions
  • Add OceanBasinMask with predefined barriers and seed points, plus convenience functions: atlantic_ocean_mask, indian_ocean_mask, pacific_ocean_mask, southern_ocean_mask, arctic_ocean_mask
  • Refactor remove_minor_basins! to delegate to label_ocean_basins instead of doing its own internal labeling
  • Preserve the existing bathymetry caching infrastructure (BathymetryRegridding, JLD2 cache)
  • Add tests for barrier geometry, basin labeling with/without barriers, and OceanBasinMask creation

Test plan

  • Existing bathymetry construction, smoothing, and caching tests still pass
  • Barrier constructors (explicit, keyword, meridional with width)
  • label_ocean_basins without barriers produces a single connected ocean
  • label_ocean_basins with barriers separates basins correctly
  • OceanBasinMask for the Atlantic excludes Pacific cells

simone-silvestri and others added 2 commits March 24, 2026 11:24
Port the refactored labeling framework from ClimaOcean while preserving
the existing bathymetry caching infrastructure. Adds Barrier type for
basin separation, label_ocean_basins with periodic/tripolar support,
OceanBasinMask with predefined seeds and convenience functions for
Atlantic, Indian, Pacific, Southern, and Arctic oceans.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@simone-silvestri simone-silvestri requested review from glwagner and navidcy and removed request for glwagner March 24, 2026 10:28
Comment thread src/Bathymetry/label_ocean_basins.jl Outdated
Comment thread src/Bathymetry/label_ocean_basins.jl Outdated
Comment thread src/Bathymetry/label_ocean_basins.jl Outdated
Comment thread src/Bathymetry/ocean_basin_mask.jl Outdated
Comment thread src/Bathymetry/ocean_basin_mask.jl Outdated
Comment thread src/Bathymetry/ocean_basin.jl
Comment thread src/Bathymetry/ocean_basin.jl
Comment thread src/Bathymetry/ocean_basin_mask.jl Outdated
Comment thread src/Bathymetry/ocean_basin_mask.jl Outdated
Comment thread src/Bathymetry/label_ocean_basins.jl Outdated
@glwagner
Copy link
Copy Markdown
Member

Can these tools also be used to identify lakes, or any enclosed body? If so "ocean basin" may not be the right word.

@simone-silvestri
Copy link
Copy Markdown
Member Author

thanks for the review, I ll go through the comments one by one.

@simone-silvestri
Copy link
Copy Markdown
Member Author

I think all the review comments have been addressed:

Changes made:

  1. Barrier vs BoundingBox — Added a Note: section to the Barrier docstring explaining the conceptual difference (Barrier blocks water cells for labeling; BoundingBox selects spatial
    data subsets)
  2. maybe_extend_longitude → copy_periodic_longitude — Renamed as suggested
  3. Nx → Nλ, nx → half — Renamed to avoid confusion with grid's Nx
  4. Large dataset concern — Added comment: "This is an O(Nλ × Nφ) CPU allocation, acceptable for the serial labeling step"
  5. OceanBasinMask → BasinMask — Renamed throughout (struct, exports, tests, docstrings)
  6. "Earth" naming — Updated convenience function docstrings to say "Earth's Atlantic Ocean" etc., and section header to "Convenience functions for Earth's ocean basins"
  7. Mask Bool instead of float — Changed all Field{Center, Center, Nothing}(grid) mask allocations to Field{Center, Center, Nothing}(grid, Bool); updated + operator to use .| instead
    of max.
  8. Docstring for seed_points — Added explanation to the BasinMask struct docstring and expanded the constructor docstring
  9. "connected component analysis" clarified — Rewrote the constructor docstring to reference ImageMorphology.label_components explicitly and describe the algorithm plainly
  10. Seed points / barriers interaction clarified — Added explanation of how multiple seeds (tried in order, first hit wins) and multiple barriers (applied simultaneously) work
  11. southern_ocean_mask alignment — Fixed indentation of keyword args to align with the function name

@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 1, 2026

Codecov Report

❌ Patch coverage is 72.67442% with 47 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/Bathymetry/ocean_basin.jl 56.41% 34 Missing ⚠️
src/Bathymetry/label_ocean_basins.jl 84.21% 12 Missing ⚠️
src/Bathymetry/regrid_bathymetry.jl 94.44% 1 Missing ⚠️

📢 Thoughts on this report? Let us know!

@glwagner
Copy link
Copy Markdown
Member

glwagner commented Apr 1, 2026

Sorry to be so annoying about Barrier -- but can we simply use BoundingBox for the same purpose as barrier? They seem to carry the exact same information and sound very similar. I understand they are used for different purposes in different contexts, but that doesn't matter. (It would only matter if we have to support both barrier and bounding box at the same time, then we need them distinct)

Comment thread src/Bathymetry/label_ocean_basins.jl Outdated
"""
Barrier{W, E, S, N}

A rectangular geographic region used to separate connected water regions during labeling.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bounding box literally represents a rectangular geographic region

Comment thread src/Bathymetry/label_ocean_basins.jl Outdated
Comment on lines +96 to +97
zb_data = zb_cpu.data[1:Nλ, :, 1]
zb_parent = zb_data.parent
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this reads weird. what is the type of zb_data here? should we be using view(zb_cpu, 1:N, :, 1)? I thought that the getindex would allocate a new array.

either way, def better to use parent(zb_data) to be more future proof.

Comment thread src/Bathymetry/label_ocean_basins.jl Outdated

# Concatenate a copy of the eastern half on the left and the western half on the right.
# This is an O(Nλ × Nφ) CPU allocation, acceptable for the serial labeling step.
zb_parent = vcat(zb_parent[half:Nλ, :], zb_parent, zb_parent[1:half, :])
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@giordano has showed cases where rebinding a name (eg reusing the name zb_parent) produces compiler problems, so you probably need to introduce a new name like concatenated_zb

Comment thread src/Bathymetry/label_ocean_basins.jl Outdated
Comment thread src/Bathymetry/label_ocean_basins.jl Outdated
# If barriers are specified, apply them to a copy of the bathymetry
if !isnothing(barriers)
# Create a temporary field with the modified bathymetry
zb_modified = Field{Center, Center, Nothing}(cpu_grid)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could probably also use similar(zb) -- may be more robust to different bathymetry types

cpu_grid = on_architecture(CPU(), grid)

TX = topology(cpu_grid, 1)
zb = cpu_grid.immersed_boundary.bottom_height
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is specific to GridFittedBottom and PartialCellBottom. perhaps it will work when we have "top" partial cells too and cut cells?

Comment thread src/Bathymetry/ocean_basin_mask.jl Outdated
Comment thread src/Bathymetry/ocean_basin_mask.jl Outdated
Comment thread src/Bathymetry/ocean_basin.jl
Comment thread src/Bathymetry/ocean_basin_mask.jl Outdated
Comment thread src/Bathymetry/ocean_basin_mask.jl Outdated
Co-authored-by: Gregory L. Wagner <wagner.greg@gmail.com>
Comment thread test/test_bathymetry.jl
Copy link
Copy Markdown
Member

@glwagner glwagner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know I'm being annoying, but it's probably nice to get this right...

  1. Can we use BoundingBox to represnet barriers?
  2. BasinMask may be wrong name, you may want "Basin" so the mask is mask = basin.mask
  3. There's a lot of F64 baked in, it looks dirty to me but maybe has a purpose?
  4. Also lots of broadcasting and CPU assumption which has caused issues in the past. Do we need this to work on GPU ever? Perhaps not, but if we have like 40m bathymetry global, would it fail? I'm also ok to merge as is and work on clean up later, if it works.
  5. I think there is also an assumption about the type of immersed boundary grid --- the main thing I think we want to consider is supporting ice shelves / top partial cells, which hopefully will come soon.

@simone-silvestri
Copy link
Copy Markdown
Member Author

I should have answered all the comments. I think this is ready to review again.

@glwagner
Copy link
Copy Markdown
Member

I should have answered all the comments. I think this is ready to review again.

The comments may have been addressed in the code but are unanswered in text. Can you please respond to my comments and say what was done to address them?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants